Real world approaches for multilingual and non-native speech recognition
Abstract
This thesis proposes a scalable architecture for multilingual speech recognition on embedded devices. In theory, multiple languages can be recognized in the same way as a single language. However, current state-of-the-art speech recognition systems are based on statistical models with many parameters, and extending such models to multiple languages requires more resources. Much research on multilingual speech recognition has therefore proposed techniques that reduce this need through parameter tying across languages. After evaluating the previous work, this thesis shows that tying at the density level offers the greatest flexibility for the design of a multilingual acoustic model. There were also hints in the literature that densities from the speakers' native language can be useful for modeling non-native accents. Based on these findings, this thesis developed an algorithm for creating Multilingual Weighted Codebooks (MWCs) that adds Gaussians from the spoken languages to the speaker's native-language codebook (a codebook being a set of Gaussians). A key advantage of this algorithm is that it optimally models the native language of the speaker, which is not the case for most of the previous work.
The results demonstrate the effectiveness of the MWC algorithm for both native and non-native speech. Its disadvantage is that the training effort increases exponentially with the number of languages considered. The answer to this problem was found in projections between Gaussian spaces. These projections make it possible to generate multilingual models from monolingual speech recognizers within fractions of a second. This eliminates the problem of training effort, because not all possible acoustic models need to be provided in advance. Instead, the languages needed on the embedded system can be determined and the required acoustic model generated online. This large reduction in time does cause a reduction in performance, but combining the MWC algorithm with the on-the-fly creation of new models leads to a scalable architecture that can recognize all languages with good performance, while the target resources remain almost independent of the number of languages.
Finally, this thesis compared several additional approaches for the optimal recognition of non-native accented speech. As the literature indicated, using the speaker's native-language codebook in the MWC algorithm already gave a significant improvement over monolingual systems. Of the other algorithms tested, only adaptation with additional non-native development data outperformed the baseline of native-language codebooks.
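The codebook extension described in the abstract (keep the native-language Gaussians untouched and add Gaussians from the other languages) can be illustrated with a minimal sketch. The abstract does not state how the added Gaussians are chosen, so the selection rule below (greedily take the candidate farthest from its nearest codebook entry) and the distance measure for diagonal-covariance Gaussians are assumptions made for illustration only. The names `Gaussian`, `distance`, and `extend_codebook` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gaussian:
    mean: Tuple[float, ...]
    var: Tuple[float, ...]  # diagonal covariance (assumption)


def distance(a: Gaussian, b: Gaussian) -> float:
    """Variance-normalized squared distance between two diagonal Gaussians.

    Assumed measure; the abstract does not name one.
    """
    return sum(
        (ma - mb) ** 2 / (va + vb)
        for ma, mb, va, vb in zip(a.mean, b.mean, a.var, b.var)
    )


def extend_codebook(
    native: List[Gaussian], others: List[Gaussian], n_add: int
) -> List[Gaussian]:
    """Return the native codebook plus up to n_add Gaussians from `others`.

    The native Gaussians are kept unchanged and come first, so the native
    language is still modeled by its own codebook. `native` must be non-empty.
    """
    codebook = list(native)
    candidates = list(others)
    for _ in range(min(n_add, len(candidates))):
        # Assumed rule: add the candidate least covered by the current codebook,
        # i.e. the one whose nearest codebook Gaussian is farthest away.
        best = max(
            candidates,
            key=lambda g: min(distance(g, c) for c in codebook),
        )
        codebook.append(best)
        candidates.remove(best)
    return codebook


if __name__ == "__main__":
    native = [Gaussian((0.0, 0.0), (1.0, 1.0))]
    others = [
        Gaussian((0.1, 0.0), (1.0, 1.0)),  # close to the native Gaussian
        Gaussian((5.0, 5.0), (1.0, 1.0)),  # far from the native Gaussian
    ]
    for g in extend_codebook(native, others, n_add=1):
        print(g)
```

Under these assumptions the size of the resulting codebook is the native codebook size plus `n_add`, so the number of added Gaussians is the knob that trades recognition coverage of other languages against resources on the device.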
Similar resources
Autonomous acoustic model adaptation for multilingual meeting transcription involving high- and low-resourced languages
In speech technology, automatic speech transcription systems for multilingual conferences or meetings face several challenges. Firstly, the dialog occurs between native and non-native speakers. Secondly, the non-native speakers come from different parts of the world (e.g., English spoken by native French speakers or English spoken by native Vietnamese speakers, etc.). Thirdly, no data or ...
Recognition of non-native German speech with multilingual recognizers
In this study we present different approaches to the recognition of non-natives. With a corpus in German spoken by speakers with 56 different first languages, the Strange Corpus, we perform recognition experiments with monolingual and multilingual recognizers. Among others, we compared two German recognizers, one that was trained in addition with non-native (Italian) speech and the other trained wit...
Improving ASR performance on non-native speech using multilingual and crosslingual information
This paper presents our latest investigation of automatic speech recognition (ASR) on non-native speech. We first report on a non-native speech corpus, an extension of the GlobalPhone database, which contains English with Bulgarian, Chinese, German and Indian accents and German with a Chinese accent. In this case, English is the spoken language (L2) and Bulgarian, Chinese, German and Indian are the ...
Multilingual Weighted Codebooks for Non-native Speech Recognition
In many embedded systems commands and other words in the user’s main language must be recognized with maximum accuracy, but it should be possible to use foreign names as they frequently occur in music titles or city names. Example systems with constrained resources are navigation systems, mobile phones and MP3 players. Speech recognizers on embedded systems are typically semi-continuous speech ...
Unsupervised acoustic model adaptation for multi-origin non native ASR
To date, the performance of speech and language recognition systems is poor on non-native speech. The challenge for nonnative speech recognition is to maximize the accuracy of a speech recognition system when only a small amount of nonnative data is available. We report on the acoustic model adaptation for improving the recognition of non-native speech in English, French and Vietnamese, spoken ...
Publication date: 2010